Natural belief-critic: a reinforcement algorithm for parameter estimation in statistical spoken dialogue systems
Abstract
This paper presents a novel algorithm for learning parameters in statistical dialogue systems which are modelled as Partially Observable Markov Decision Processes (POMDPs). The three main components of a POMDP dialogue manager are a dialogue model representing dialogue state information; a policy which selects the system’s responses based on the inferred state; and a reward function which specifies the desired behaviour of the system. Ideally both the model parameters and the policy would be designed to maximise the reward function. However, whilst there are many techniques available for learning the optimal policy, there are no good ways of learning the optimal model parameters that scale to real-world dialogue systems. The Natural Belief-Critic (NBC) algorithm presented in this paper is a policy gradient method which offers a solution to this problem. Based on observed rewards, the algorithm estimates the natural gradient of the expected reward. The resulting gradient is then used to adapt the prior distribution of the dialogue model parameters. The algorithm is evaluated on a spoken dialogue system in the tourist information domain. The experiments show that model parameters estimated to maximise the reward function result in significantly improved performance compared to the baseline handcrafted parameters.
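The abstract describes the NBC loop only at a high level: sample dialogue model parameters from a prior, observe the rewards, estimate the natural gradient of the expected reward, and use that gradient to adapt the prior. The Python sketch below illustrates the loop under simplifying assumptions that are not taken from the paper. The model parameters are treated as a real-valued vector with an isotropic Gaussian prior N(μ, σ²I), and only the mean μ is adapted. The natural gradient is estimated with an episodic natural-actor-critic style least-squares regression of rewards on the prior's score function. `simulate_dialogue_reward` is a hypothetical noisy quadratic that stands in for a real dialogue system. All function names are illustrative.

```python
import random


def solve_linear(a, b):
    """Solve the square system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def simulate_dialogue_reward(theta, rng):
    """HYPOTHETICAL stand-in for running one dialogue with model parameters theta:
    a noisy quadratic that peaks at an arbitrary 'best' parameter vector."""
    best = [1.0, -2.0]
    return -sum((t - b) ** 2 for t, b in zip(theta, best)) + rng.gauss(0.0, 0.1)


def estimate_natural_gradient(mu, sigma, samples, rewards):
    """Regress each dialogue's reward on the prior's score function plus a constant.
    The coefficients w approximate F^{-1} * grad J (the natural gradient);
    the constant term is a baseline estimate of the expected reward."""
    d = len(mu)
    # Score of N(mu, sigma^2 I) with respect to mu is (theta - mu) / sigma^2.
    rows = [[(t[i] - mu[i]) / sigma ** 2 for i in range(d)] + [1.0] for t in samples]
    k = d + 1
    # Normal equations of the least-squares fit: (X^T X) sol = X^T R.
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xtr = [sum(r[i] * rew for r, rew in zip(rows, rewards)) for i in range(k)]
    sol = solve_linear(xtx, xtr)
    return sol[:d], sol[d]


def natural_belief_critic_sketch(mu, sigma=0.5, iterations=50, dialogues=200, lr=0.5, seed=0):
    """Adapt the prior mean mu by repeated natural-gradient ascent on expected reward."""
    rng = random.Random(seed)
    mu = list(mu)
    for _ in range(iterations):
        # 1. Sample one parameter vector per dialogue from the current prior.
        samples = [[m + sigma * rng.gauss(0.0, 1.0) for m in mu] for _ in range(dialogues)]
        # 2. Observe the reward of each dialogue run with its sampled parameters.
        rewards = [simulate_dialogue_reward(t, rng) for t in samples]
        # 3. Estimate the natural gradient and move the prior toward higher reward.
        w, _baseline = estimate_natural_gradient(mu, sigma, samples, rewards)
        mu = [m + lr * wi for m, wi in zip(mu, w)]
    return mu


if __name__ == "__main__":
    print(natural_belief_critic_sketch([0.0, 0.0]))
```

Starting from `[0.0, 0.0]`, the prior mean should end up close to the toy optimum `[1.0, -2.0]`. The Gaussian prior was chosen only because its Fisher information with respect to μ is simply I/σ². That makes it easy to check that the regression coefficients equal σ² times the ordinary gradient. A dialogue model whose parameters are probability tables would more naturally use a prior over the probability simplex, for example a Dirichlet.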
Related papers
Reinforcement learning for parameter estimation in statistical spoken dialogue systems
Reinforcement techniques have been successfully used to maximise the expected cumulative reward of statistical dialogue systems. Typically, reinforcement learning is used to estimate the parameters of a dialogue policy which selects the system’s responses based on the inferred dialogue state. However, the inference of the dialogue state itself depends on a dialogue model which describes the exp...
Statistical methods for spoken dialogue management
Blaise Thomson. Spoken dialogue systems provide a mechanism for interacting with computers that is both natural and effective for human use. This thesis describes a practical framework for building these systems based on the Partially Observable Markov Decision Process (POMDP). The underlying belief state is represented by a dynamic Bayesian Net...
Bayesian update of dialogue state: A POMDP framework for spoken dialogue systems
This paper describes a statistically motivated framework for performing real-time dialogue state updates and policy learning in a spoken dialogue system. The framework is based on the partially observable Markov decision process (POMDP), which provides a well-founded, statistical model of spoken dialogue management. However, exact belief state updates in a POMDP model are computationally intrac...
Keynote: Statistical Approaches to Open-domain Spoken Dialogue Systems
In contrast to traditional rule-based approaches to building spoken dialogue systems, recent research has shown that it is possible to implement all of the required functionality using statistical models trained using a combination of supervised learning and reinforcement learning. This approach to spoken dialogue is based on the mathematics of partially observable Markov decision processes (PO...
Sample Efficient Deep Reinforcement Learning for Dialogue Systems with Large Action Spaces
In Statistical Dialogue Systems, we aim to deploy Artificial Intelligence to build automated dialogue agents that can converse with humans. A part of this effort is the policy optimisation task, which attempts to find a policy describing how to respond to humans, in the form of a function taking the current state of the dialogue and returning the response of the system. In this project, we inve...